
Feat/newstats#2

Merged
lex0c merged 8 commits into main from feat/newstats
Apr 18, 2026

Conversation

@lex0c
Owner

@lex0c lex0c commented Apr 18, 2026

No description provided.

lex0c and others added 2 commits April 18, 2026 16:50
DevProfile.Specialization is a Herfindahl concentration index (Σ pᵢ²)
over the dev's full per-directory file-count distribution. 1 = all
files in one directory (narrow specialist); 1/N for a uniform spread
across N directories; approaches 0 for broad generalists.

Surfaces a question the current profile cannot answer: "Is this
person a specialist or a generalist?" DevProfile.Scope already
shows the top 5 directories with percentages, but reading breadth
from visual percentages is subjective. One number makes the
distinction crisp.

Why Herfindahl, not Gini: Gini measures inequality among buckets,
not concentration. A dev with 1 file in 1 directory and a dev with
1 file in each of 5 directories both get Gini 0, collapsing the two
opposite ends of the spectrum. Herfindahl correctly separates them
(1 vs 0.2). An initial implementation used Gini; the review caught
the semantic error before the feature shipped and it was rewritten
before the first commit.

Four labels applied via thresholds in named constants:
  H < 0.15   broad generalist
  H < 0.35   balanced
  H < 0.7    focused specialist
  H ≥ 0.7    narrow specialist

Implementation:
- stats.go adds herfindahl() helper and specBroadGeneralistMax /
  specBalancedMax / specFocusedMax constants.
- DevProfiles computes Specialization over the full dirCount map
  (before the top-5 Scope truncation, so it reflects actual
  breadth not the truncated display).
- format.go specLabel() maps the Herfindahl value to a band.
- template.go embeds the value in the profile card grid.
- profile_template.go surfaces it inline in the Scope header of the
  standalone per-developer report.
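The helper and the label mapping described above can be sketched as follows. This is an illustration only, assuming the shapes implied by the commit message; the actual signatures in stats.go and format.go may differ.

```go
package main

import "fmt"

// herfindahl computes the Herfindahl concentration index (Σ pᵢ²)
// over a per-directory file-count distribution. Returns 0 for an
// empty or all-zero distribution.
func herfindahl(counts map[string]int) float64 {
	total := 0
	for _, c := range counts {
		total += c
	}
	if total == 0 {
		return 0
	}
	h := 0.0
	for _, c := range counts {
		p := float64(c) / float64(total)
		h += p * p
	}
	return h
}

// Threshold constants named in the commit; values from the band table above.
const (
	specBroadGeneralistMax = 0.15
	specBalancedMax        = 0.35
	specFocusedMax         = 0.7
)

// specLabel maps a Herfindahl value to one of the four bands.
func specLabel(h float64) string {
	switch {
	case h < specBroadGeneralistMax:
		return "broad generalist"
	case h < specBalancedMax:
		return "balanced"
	case h < specFocusedMax:
		return "focused specialist"
	default:
		return "narrow specialist"
	}
}

func main() {
	narrow := map[string]int{"internal/stats": 9}
	broad := map[string]int{"a": 1, "b": 1, "c": 1, "d": 1, "e": 1}
	fmt.Println(herfindahl(narrow), specLabel(herfindahl(narrow)))
	fmt.Println(herfindahl(broad), specLabel(herfindahl(broad)))
}
```

Note how the uniform 5-directory dev lands at 1/N = 0.2 ("balanced") while the single-directory dev lands at 1 ("narrow specialist"), matching the separation the commit claims over Gini.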

Tests (17 cases across 4 functions):
- TestHerfindahlHelper: 9 cases covering empty, single non-zero,
  uniform (2 and 5 buckets), 70/30, 90/10, single-zero, zeros-only.
- TestSpecLabelBands: 8 cases at boundaries of each band.
- TestDevProfilesSpecialization: three synthetic devs (narrow in 1
  dir, focused 7/3 split, broad 5-dir uniform) assert the
  narrow > focused > broad ordering with specific values.
- TestDevProfilesSpecializationEdgeCases: dev with 0 files → 0;
  dev with 1 file in 1 dir → 1.0.

Validated on real repos. kubernetes (5,295 devs): 27% broad /
17.5% balanced / 18.2% focused / 37.3% narrow. The distribution
matches known structure — maintainers review broadly, subsystem
owners focus.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- METRICS.md Profile table: new Specialization row with full
  Herfindahl semantics, why Herfindahl beats Gini for this use
  case, and a pointer to the three threshold constants.
- METRICS.md Thresholds table: specBroadGeneralistMax,
  specBalancedMax, specFocusedMax added.
- README.md stats table row for `profile` mentions the new
  specialization index.
- README.md "Developer profile" section:
    * "Each profile includes" bullet list gains a Specialization
      entry with label explanation.
    * Collaboration bullet updated to mention that ranking uses
      shared_lines (Σ min(linesA, linesB)) — a drift-fix from the
      earlier SharedLines feature that was not reflected in the
      profile-specific section.
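The shared_lines ranking mentioned above can be illustrated with a minimal sketch. The per-file line maps and the function name here are assumptions for illustration; only the Σ min(linesA, linesB) formula comes from the commit message.

```go
package main

import "fmt"

// sharedLines ranks a collaboration pair by summing, over every file
// both devs authored lines in, the smaller of the two line counts:
// Σ min(linesA, linesB).
func sharedLines(a, b map[string]int) int {
	total := 0
	for file, la := range a {
		if lb, ok := b[file]; ok {
			if la < lb {
				total += la
			} else {
				total += lb
			}
		}
	}
	return total
}

func main() {
	alice := map[string]int{"stats.go": 120, "format.go": 30}
	bob := map[string]int{"stats.go": 80, "template.go": 50}
	// Only stats.go overlaps: min(120, 80) = 80.
	fmt.Println(sharedLines(alice, bob))
}
```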

No changes to the code; documentation only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lex0c
Owner Author

lex0c commented Apr 18, 2026

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eb2f46f676

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread internal/stats/stats.go
Comment thread internal/stats/stats.go Outdated
lex0c and others added 3 commits April 18, 2026 17:00
The Specialization label and Herfindahl value describe where a dev's
files live on disk, not the semantic area they work in. The two can
disagree whenever the dev's domain cuts across the directory
structure rather than aligning with it.

Concrete cases where the label misleads:
- A security engineer who refactored auth across api/, web/,
  gateway/, services/ touches four dirs → Herfindahl low → labeled
  "broad generalist" even though they are a domain specialist whose
  domain is cross-cutting.
- A release engineer maintaining CI config scattered across
  .github/, docker/, scripts/, deploy/ lands the same way.
- A generalist who happened to land a big single-module refactor in
  the recent window looks like a "narrow specialist" for that
  snapshot.

Add a caveat block in METRICS.md's "Behavior and caveats" section
with those three examples and a closing line: "The raw Herfindahl
value is objective; the interpretation of the label is not." Flag
the limitation directly in the Profile table row so readers see it
where they first encounter the metric. README's bullet gets a
one-line italic version pointing at the full caveat.

No code changes — the metric itself stays as-is; what changes is
how readers are guided to interpret it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DevProfiles derived per-directory file counts using `dir := path` as
the fallback when the path contained no slash, so each repo-root
filename became its own pseudo-directory. A dev touching README,
Makefile, go.mod, and LICENSE ended up with four pseudo-dirs × 1 file
each, driving Herfindahl down to 0.25 ("balanced") and mislabelling
a narrow specialist on the repo root as a broad generalist.

DirectoryStats already handled this correctly (dir := "." fallback
at line 277). Align DevProfiles with the same convention. The fix is
one line; the comment above it explains why the otherwise tempting
`path` fallback is wrong.
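The one-line fix described above amounts to the following sketch, assuming a string-index fallback (the real code sits inside DevProfiles; only the "." convention is confirmed by the commit):

```go
package main

import (
	"fmt"
	"strings"
)

// dirOf buckets a path by its directory. Repo-root files all fall
// into "." rather than each becoming its own pseudo-directory —
// falling back to `path` itself would give a dev touching README,
// Makefile, go.mod, LICENSE four one-file buckets (Herfindahl 0.25)
// instead of one four-file bucket (Herfindahl 1.0).
func dirOf(path string) string {
	if i := strings.LastIndex(path, "/"); i >= 0 {
		return path[:i]
	}
	return "." // NOT path: all root-level files share one bucket
}

func main() {
	for _, p := range []string{"README.md", "go.mod", "internal/stats/stats.go"} {
		fmt.Println(p, "->", dirOf(p))
	}
}
```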

New regression test TestDevProfilesSpecializationRootFilesBucket pins
the behavior: a dev with four root-level files must produce
Specialization=1.0 and Scope=[{".", 4}], not Specialization=0.25
across four pseudo-dirs.

Real-data impact: ~200 devs across the four validation repos (86 on
pi-hole, 3 on praat, 3 on WordPress, 106 on kubernetes) were
top-scoped on repo-root files and of those roughly half now
correctly classify as specialists instead of being diluted into the
generalist band.

Breaking change for JSON/CSV consumers: DevProfile.Scope entries
previously carried filenames like "README.md" when a dev worked on
repo-root files; they now carry "." as the directory name. Any
downstream script that indexed Scope[].Dir by filename needs to be
updated — those scripts were reading incorrect data anyway, since
the intent of the field was always a directory, not a file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
herfindahl rounded its return value to three decimals before handing
it back to DevProfiles, which then stored the rounded value in
Specialization and fed it directly to the label threshold comparisons
(< 0.15, < 0.35, < 0.7). The internal rounding caused quantization-
induced misclassification at band boundaries: a true Herfindahl of
0.1496 rounded to 0.150, which is NOT strictly less than 0.15, so it
flipped from "broad generalist" to "balanced".

Fix: remove the Round from herfindahl; return full float64 precision.
Rounding only happens at format time — both the CLI
(`fmt.Fprintf("%.2f", ...)`) and the HTML templates
({{printf "%.2f" ...}}) already round for display. Classification
logic now runs on the exact computed value.
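The quantization failure mode is easy to reproduce in isolation. A minimal sketch (the `roundedBad` helper is hypothetical, standing in for the removed internal rounding):

```go
package main

import (
	"fmt"
	"math"
)

// roundedBad reproduces the removed behavior: rounding to three
// decimals inside the helper before classification.
func roundedBad(h float64) float64 {
	return math.Round(h*1000) / 1000
}

func main() {
	h := 0.1496 // true Herfindahl value just under the 0.15 threshold

	// With internal rounding: 0.1496 -> 0.150, which is NOT < 0.15,
	// so the dev flips from "broad generalist" to "balanced".
	fmt.Println(roundedBad(h) < 0.15)

	// With full precision: classification is correct.
	fmt.Println(h < 0.15)

	// Rounding belongs only at display time.
	fmt.Printf("%.2f\n", h)
}
```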

Two new tests guard the invariant:

- TestHerfindahlPreservesPrecision: feeds [1,1,1] and asserts the
  result equals 1/3 within 1e-12. A previous version returning 0.333
  fails this test explicitly.
- TestSpecLabelBandsBoundaryPrecision: passes values just under and
  just over each threshold (0.149999, 0.150001, etc.) and verifies
  each lands in the expected band. Before the fix, values in the
  0.1495-0.1505 range could flip bands depending on how the internal
  rounding broke ties.

JSON output for Specialization will now show full-precision floats
for inputs that yield irrational results (e.g. 1/3 marshals as
"0.3333333333333333" instead of "0.333"). Consumers wanting a fixed
display precision should round at parse time; the honest value is
the full float.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lex0c
Owner Author

lex0c commented Apr 18, 2026

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 462557507e


Comment thread internal/stats/format.go Outdated
lex0c and others added 2 commits April 18, 2026 17:21
PrintProfiles formatted Specialization with %.2f while the band label
was derived from the unrounded float. At boundary values this
produced visually contradictory output: a true Herfindahl of 0.149
rendered as "0.15 (broad generalist)" where the shown number sits
exactly at the threshold and the label reads as wrong, even though
the classification is correct (the real value 0.149 IS < 0.15).

Switch the three display sites to %.3f. The number rendered now
distinguishes 0.149 from 0.150 and lines up with the label the code
computed from the full-precision value. This preserves the
classification-integrity property from the prior commit (4625575)
while eliminating the visual inconsistency.

Test TestPrintProfilesSpecializationDisplayPrecision constructs a
profile with Specialization=0.149, renders it through PrintProfiles,
and asserts the output contains both "0.149" (not "0.15") and
"broad generalist". Guards against regression to %.2f display or
rounding drift in the formatter.

Also switches embedded and standalone HTML profile templates to
the same %.3f.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Specialization row in METRICS.md described the math and the label
thresholds, but not the relationship between what the CLI/HTML show
and what JSON output carries. With the display precision fix in the
previous commit (%.3f in CLI/HTML) and the precision-preservation
fix from 4625575 (full float64 in JSON), there is now a meaningful
asymmetry worth documenting:

  - CLI and HTML round to 3 decimals for readability.
  - JSON preserves the full float64 value so the label can be
    reproduced exactly by downstream consumers.
  - Classification runs against the raw float, not the displayed
    value. A value of 0.149 classifies as broad generalist even
    though a rounded "0.15" would not strictly be below the
    threshold.

Consumers that re-derive the label from JSON must use the raw
Specialization field, not a rounded copy. Added one sentence in the
Profile table row to make the contract explicit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lex0c
Owner Author

lex0c commented Apr 18, 2026

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Bravo.


The pre-existing 2500 threshold fails the quality gate as the repo
accumulates activity across the stats expansion work. Bumping to
5500 restores headroom for ongoing development without disabling
the gate. The comment about adding --fail-on-busfactor 1 when the
team grows is preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lex0c lex0c merged commit 37993cb into main Apr 18, 2026
1 check passed
@lex0c lex0c deleted the feat/newstats branch April 18, 2026 20:27
lex0c added a commit that referenced this pull request Apr 20, 2026
Two related changes in the DevProfile Scope/Extensions render:

1. Denominator switch: cs.FilesTouched → len(devFiles[email])

   FilesTouched counts every path the dev appeared on, including pure
   renames (commits where add==0 && del==0 on that file). The Scope
   and Extensions numerators, however, are built from fe.devLines,
   which only tracks files where the dev wrote lines. The mismatch
   silently deflated all Pcts in repos with reorgs and left sums
   below 100% with no visible cause.

   Switching to the authored-file count as the denominator makes the
   two sides consistent, aligns with the Herfindahl specialization
   index which already used the authored population, and leaves
   truncation as the ONLY reason a Pct sum can drop below 100.
   Validated on 5726 real profiles (pi-hole, praat, WordPress,
   kubernetes): zero profiles with <5 buckets sum to anything other
   than 100% after the fix.

2. ScopeHidden / ExtensionsHidden counters + "+N more" rendering

   A reader seeing "Scope: foo (28%), bar (25%), ... (+6 more)" now
   understands the 85% sum comes from 6 hidden buckets; previously
   the 85% read as a bug. Surfaced in CLI (inline suffix), HTML main
   report profile card (inline italic span), and HTML dedicated
   profile (line below the bar legend). Silent when Hidden == 0.
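The "+N more" rendering can be sketched as below. The `bucket` type and `scopeLine` name are assumptions for illustration; the real rendering spans the CLI formatter and both HTML templates.

```go
package main

import (
	"fmt"
	"strings"
)

// bucket is a visible Scope entry: directory plus its percentage.
type bucket struct {
	Dir string
	Pct int
}

// scopeLine joins the visible buckets and, when some were truncated,
// appends a "(+N more)" suffix so a sub-100% sum is explained.
// Silent when hidden == 0, matching the commit's behavior.
func scopeLine(visible []bucket, hidden int) string {
	parts := make([]string, 0, len(visible))
	for _, b := range visible {
		parts = append(parts, fmt.Sprintf("%s (%d%%)", b.Dir, b.Pct))
	}
	s := strings.Join(parts, ", ")
	if hidden > 0 {
		s += fmt.Sprintf(" (+%d more)", hidden)
	}
	return s
}

func main() {
	fmt.Println(scopeLine([]bucket{{"foo", 28}, {"bar", 25}}, 6))
	fmt.Println(scopeLine([]bucket{{"foo", 60}, {"bar", 40}}, 0))
}
```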

Tests: regression on the denominator change (FilesTouched=5 but
authored=4, sum must be 100 not 80); explicit truncation-sum
invariant (6 buckets → 5 visible, sum<100); Hidden counters == 0 in
the common case; Hidden counters == exact drop count when
truncated.

METRICS.md: both Scope and Extensions rows updated with the new
denominator wording; Extensions caveat #2 (pure-rename gap) removed
since it no longer applies.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
